Efficient Distributed Clustering Algorithms on Star-Schema Heterogeneous Graphs
Authors
Abstract
Many datasets, including social media data and bibliographic data, can be modeled as graphs. Clustering such graphs can provide useful insights into the structure of the data. To improve the quality of clustering, node attributes can be taken into account, resulting in attributed graphs. Existing attributed graph clustering methods generally consider attribute similarity and structural similarity separately. In this paper, we represent attributed graphs as star-schema heterogeneous graphs, where attributes are modeled as different types of graph nodes. This enables the use of personalized PageRank (PPR) as a unified distance measure that captures both structural and attribute similarities. We employ DBSCAN for clustering, and we update edge weights iteratively to balance the importance of different attributes. The rapidly growing volume of data nowadays challenges traditional clustering algorithms, and thus a distributed method is required. Hence, we adopt a popular distributed graph computing system, Blogel, based on which we develop four exact and approximate approaches that enable efficient PPR score computation when edge weights are updated. To improve the effectiveness of the clustering, we propose a simple yet effective edge weight update strategy based on entropy. In addition, we present a game-theory-based method that enables trading efficiency for result quality. Extensive experiments on real-life datasets offer insights into the effectiveness and efficiency of our proposals.
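The idea of using PPR as a single distance measure on a star-schema graph can be illustrated with a small single-machine sketch. This is not the authors' distributed Blogel implementation or their entropy-based weight update. It is a plain power iteration under assumed settings. The node names, the helper build_edges, the restart probability alpha=0.15, and the iteration count are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def personalized_pagerank(edges, source, alpha=0.15, iters=100):
    """Approximate PPR scores from `source` by power iteration.

    edges: iterable of (u, v, weight) for an undirected graph, weight > 0.
    alpha: restart probability (an assumed value, not taken from the paper).
    """
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    if source not in adj:
        raise ValueError("source node has no edges")

    nodes = list(adj)
    scores = {n: 0.0 for n in nodes}
    scores[source] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        # Restart: a fraction alpha of the total mass (which is 1) returns to source.
        nxt[source] = alpha
        # Walk: the remaining mass moves along edges in proportion to edge weight.
        for u in nodes:
            total = sum(adj[u].values())
            for v, w in adj[u].items():
                nxt[v] += (1 - alpha) * scores[u] * w / total
        scores = nxt
    return scores


def build_edges(attr_weight):
    """Hypothetical toy star-schema graph: authors plus attribute-value nodes."""
    return [
        ("a1", "a2", 1.0),                # structural edge between two authors
        ("a1", "topic:db", attr_weight),  # a1 has attribute value "db"
        ("a3", "topic:db", attr_weight),  # a3 shares that value with a1
        ("a2", "topic:ml", 1.0),          # a2 has a different attribute value
    ]


if __name__ == "__main__":
    for w in (1.0, 5.0):
        scores = personalized_pagerank(build_edges(w), "a1")
        print(w, round(scores["a3"], 4))
```

In this toy graph, a1 and a3 have no structural edge, yet a3 receives a nonzero PPR score from a1 through the shared attribute node. Raising the weight of the attribute edges raises that score, which is the lever an iterative edge weight update would adjust.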
Similar resources
Distributed Clustering on Graphs
This paper provides new algorithms for distributed clustering for two popular center-based objectives, k-median and k-means. These algorithms have provable guarantees and improve communication complexity over existing approaches. Following a classic approach in clustering by [13], we reduce the problem of finding a clustering with low cost to the problem of finding a ‘coreset’ of...
Clustering with Proximity Graphs: Exact and Efficient Algorithms
Graph Proximity Cleansing (GPC) is a string clustering algorithm that automatically detects cluster borders and has been successfully used for string cleansing. For each potential cluster a so-called proximity graph is computed, and the cluster border is detected based on the proximity graph. However, the computation of the proximity graph is expensive and the state-of-the-art GPC algorithms on...
Efficient Evolutionary Algorithms for the Clustering Problem in Directed Graphs
This paper presents improvements in the performance of standard genetic algorithms (GAs) as regards the solution of highly complex combinatorial optimization problems. These improvements are related to some modifications in the GA, including local search and/or diversification procedures. The performance of each proposed version is evaluated through a graph partitioning problem. Extensive compu...
General and Robust Communication-Efficient Algorithms for Distributed Clustering
As datasets become larger and more distributed, algorithms for distributed clustering have become more and more important. In this work, we present a general framework for designing distributed clustering algorithms that are robust to outliers. Using our framework, we give a distributed approximation algorithm for k-means, k-median, or generally any ℓ_p objective, with z outliers and/or balance ...
Survey on Variants of Distributed Energy efficient Clustering Protocols in heterogeneous Wireless Sensor Network
Wireless sensor networks are composed of low cost and extremely power constrained sensor nodes which are scattered over a region forming self organized networks, making energy consumption a crucial design issue. Thus, finite network lifetime is widely regarded as a fundamental performance bottleneck. These networks are used for various applications such as field monitoring, home automation, med...
Journal
Journal title: IEEE Transactions on Knowledge and Data Engineering
Year: 2022
ISSN: 1558-2191, 1041-4347, 2326-3865
DOI: https://doi.org/10.1109/tkde.2020.3047631